Skip to content

PERF: trim import time ~5% #28227

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Sep 5, 2019
Merged

Conversation

jbrockmendel
Copy link
Member

by avoiding/lazifying expensive stdlib imports.

Exact timings are tough because repeated runs of python3 -X importtime -c "import pandas as pd" are really high variance, but my general read is that this trims about 35 ms out of about 650 ms.

@simonjayhawkins simonjayhawkins added the Performance Memory or execution speed performance label Aug 29, 2019
Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add these modules to the blacklist in ci/code_checks.sh?

Lazy-import wrapper for stdlib urlopen, as that imports a big chunk of
the stdlib.
"""
from urllib.request import urlopen as _urlopen
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the convention is trailing underscores to prevent clashes.

Or,

import urllib.request

return urllib.request.urlopen(*args, **kwargs)

@WillAyd
Copy link
Member

WillAyd commented Aug 30, 2019

Hmm so if we only expect a 5% improvement in import time I am actually -1 on this change. I don't think its worth adding runtime logic for a small optimization

@jbrockmendel
Copy link
Member Author

Can you add these modules to the blacklist in ci/code_checks.sh?

Looks like it checks for e.g. "urllib" as opposed to "urllib.request". Since we also import urllib.parse, we can't exclude that. Excluding "http" works though.

@jbrockmendel
Copy link
Member Author

so if we only expect a 5% improvement in import time

Some context on that number:

  • PERF: lazify pytz seqToRE call, trims 35ms from import #28228 gets another 5% or so
  • numpy master (but not 1.17) makes numpy.testing import lazy, trimming about 15% from their import time, so about 3% of ours
  • IIRC before long we'll stop importing matplotlib at import-time, which could trim our time by up to 20% (based on current timing of pandas.plotting._matplotlib import)

The 5% we get here becomes a bigger deal as those get whittled down

@jreback
Copy link
Contributor

jreback commented Aug 30, 2019

trimming import time is always good; sure it sometimes makes the code a bit more complex but +1 here

@topper-123
Copy link
Contributor

I'm ok with this, especially on the aggregate benefits, but suspect it will be brittle, as someone could in the future (unaware) undo this optimization, so we only get left with the complexity, and no benefit. Maybe add tests for the content of sys.modules in a seperate CI run?

@WillAyd
Copy link
Member

WillAyd commented Aug 30, 2019

Does the code check added not catch that?

@WillAyd
Copy link
Member

WillAyd commented Aug 30, 2019

Ah nevermind - I see that is only for the http module not urllib

@topper-123
Copy link
Contributor

topper-123 commented Aug 30, 2019

Im also thinking more general, imports in imported modules etc.

@TomAugspurger
Copy link
Contributor

We could modify the check. I don't know why I originally wrote it to just check the package.

@jbrockmendel
Copy link
Member Author

edited code check

@topper-123
Copy link
Contributor

Wouldn't it make sense to have a whitelist instead of a blacklist? A blacklist will allow a lot of things that we might not really want when importing pandas.

@jbrockmendel
Copy link
Member Author

Let's leave the idea of a whitelist changeover for a separate discussion. Other than that, I think everything has been addressed here.

@jreback jreback added this to the 1.0 milestone Sep 2, 2019
@TomAugspurger
Copy link
Contributor

A merge of master should fix CI.

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merging later today.

@jreback
Copy link
Contributor

jreback commented Sep 4, 2019

can you rebase

@jbrockmendel
Copy link
Member Author

Rebased +green

@TomAugspurger TomAugspurger merged commit 04e67c4 into pandas-dev:master Sep 5, 2019
@TomAugspurger
Copy link
Contributor

Thanks.

@jbrockmendel jbrockmendel deleted the iospeed branch September 5, 2019 16:02
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
* PERF: trim import time ~5% with lazy imports
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
* PERF: trim import time ~5% with lazy imports
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants